PstI

TAACTGCAG ATG GCA AAC ATT TCT GTG GTT GCT GCT GCA CTA CTG GTC 48
Met Ala Asn Ile Ser Val Val Ala Ala Ala Leu Leu Val
1 5 10

TTG CTG GTG TTG GGT CAT GCC ACT GCA AGC ATC TAC AGG ACA GTT GTG 96
Leu Leu Val Leu Gly His Ala Thr Ala Ser Ile Tyr Arg Thr Val Val
15 20 25

GAG TTT GAA GAG GAT GAT GCC ACC AAC CCA ATA GGT CCT AAG ATG AGG 144
Glu Phe Glu Glu Asp Asp Ala Thr Asn Pro Ile Gly Pro Lys Met Arg
30 35 40 45

AAA TGC AGA AAG GAG TTC CAG AAG GAA CAA ATG TTG AGA GCT TGC CAA 192
Lys Cys Arg Lys Glu Phe Gln Lys Glu Gln Met Leu Arg Ala Cys Gln
50 55 60 65

CAA TGG TTG AGG AAA CAA GCT AGA CAA GGA AGA TCT GAT GAA TTT GAC 240
Gln Trp Leu Arg Lys Gln Ala Arg Gln Gly Arg Ser Asp Glu Phe Asp
70 75 80 85

TTT GAA GAT GAC ATG GAG AAT CCT CAA GGA CCA CAG CAG AGA CCT CCT 288
Phe Glu Asp Asp Met Glu Asn Pro Gln Gly Pro Gln Gln Arg Pro Pro
90 95 100

CTC CTT CAG AAG TGC TGT GAG CAA CTC AAA CAG ATG CAA TCT CAG TGT 336
Leu Leu Gln Lys Cys Cys Glu Gln Leu Lys Gln Met Gln Ser Gln Cys
105 110 115

GTT TGC CCA ACC CTT AAA GGT GCC AGC AAA GCT GTG AAA CAG GAA GAG 384
Val Cys Pro Thr Leu Lys Gly Ala Ser Lys Ala Val Lys Gln Glu Glu
120 125 130

CAG CAA CAA GGC CAG CAA CAA GGT AAG CAG CAG ATG GTT AGG AAG ATC 432
Gln Gln Gln Gly Gln Gln Gln Gly Lys Gln Gln Met Val Arg Lys Ile
135 140 145

TAT AAG ACT GCC AAA CAC CTT CCT AAA GTC TGT GAC ATT CCA CAG GTT 480
Tyr Lys Thr Ala Lys His Leu Pro Lys Val Cys Asp Ile Pro Gln Val
150 155 160 165

EcoRI

GAT GTA TGC CCA TTT CAG AAG ACC ATG CCT GGG CCC TCA TAC TAGAATT 529
Asp Val Cys Pro Phe Gln Lys Thr Met Pro Gly Pro Ser Tyr ***
170 175
(57) Abstract

Methods which allow for nutritional improvement
of plants and plant tissue by increasing the amount of a
selected amino acid(s) in a seed storage protein involve
altering a naturally-occurring seed storage protein gene.
Oligonucleotides coding for the protein are assembled by
use of overlapping synthesized DNA sequences.



PROCESS FOR ENHANCING THE CONTENT OF A SELECTED AMINO ACID
IN A SEED STORAGE PROTEIN 


Technical Field 

The present invention relates to methods of producing
transgenic plants having an increased content of selected
amino acids in modified seed storage proteins and, more
particularly, to methods of making an improved seed storage
protein.

Background of the Invention

Greater recognition of the role of plants in supplying
essential amino acids to the animal world has led to
emphasis on the development of new food plants that have
proteins that are better balanced for human and animal
nutrition. Classical plant breeding techniques have
limitations for achieving this goal. Molecular genetics,
however, shows potential for overcoming these limitations.

Seed storage proteins represent up to 90% of total seed
protein in many plant seeds. Shotwell and Larkins (1989)
In: The Biochemistry of Plants Vol. 15 (Academic Press, San
Diego: Stumpf and Conn, eds.) Chapter 7: 29. These
naturally-occurring proteins are used as a source of
nutrition for young seedlings for the growth period just
following germination. The genes encoding them are strictly
regulated, being expressed in a highly tissue-specific and
developmentally stage-specific fashion. Walling, et al.
(1986) Proc. Natl. Acad. Sci. 83, 2123-2127; Higgins, T.J.V.
(1984) Ann. Rev. Plant Physiol. 35, 191-221. Thus they are
expressed almost exclusively in developing seed, and
different classes of seed storage proteins may be expressed
at different stages in the development of the seed.

WO 94/10315 PCT/US93/10090

The expression of foreign genes in plants is well
established. De Blaere, et al. (1987) Methods in Enzymology
153, 277. Seed storage protein genes have been transferred
to other plants. Okamura, et al. (1986) Proc. Natl. Acad.
Sci. 83, 8240; Sengupta-Gopalan, et al. (1985) Proc. Natl.
Acad. Sci. 82, 3320; Higgins, et al. (1988) Plant Mol. Biol.
11, 683; Ellis, et al. (1988) Plant Mol. Biol. 10, 203;
Barker, et al. (1988) Proc. Natl. Acad. Sci. 85, 458;
Vandekerckhove, et al. (1989) Bio/Technol. 7, 929; and
Altenbach, et al. (1989) Plant Mol. Biol. 13, 513. In most
of these cases it was shown that within its new environment,
the transferred seed storage protein gene is expressed in a
tissue-specific and developmentally regulated manner.
Beachy, et al. (1985) EMBO J. 4, 3047. The expression
levels varied, but reached as high as 8% of the total seed
protein. Altenbach, et al., supra; Voelker, et al. (1989)
Plant Cell 1, 95.

However, design of a synthetic seed storage protein
requires more than mere substitution of the desired amino
acid for naturally-occurring amino acids in the target
protein. Criteria must be defined for maximizing the
potential of success and the ultimate expression of the gene
in the targeted host plant. Even selection of the class of
storage proteins least likely to present difficulties is
important, and is dependent on the availability of sequence
data for that class of proteins, the relative gene size
within that class, and the degree of processing and
post-translational modification necessary for deposition. Seed
storage proteins are nominally classified by density
gradient sedimentation values: 2S, 7S, and 11S. Although
the 7S and 11S proteins tend to be one general type per
sedimentation value, the 2S seed proteins are a diverse
group. The 2S sedimentation value implies a relatively low
molecular weight, and the 2S proteins include classic
storage proteins as well as lectins, protease inhibitors,
and others. The 2S storage proteins appear to be less
restricted in amino acid composition than 7S and 11S
proteins, and include species which are relatively rich in
basic amino acids. Additionally, the 2S storage proteins
are encoded on small genes, making the prospect of
synthesizing a new 2S gene from oligonucleotides attractive.

Among published seed protein sequence data, no protein
incorporating a non-limiting amount of lysine has been
identified. Lysine comprises from 3 to 7% of the total
amino acids in known seed protein sequences. It is
estimated that a protein containing 10 to 15% lysine,
expressed transgenically at a level of 2 to 5%, is necessary
to cause a noticeable increase in seed deposition of lysine.
No storage protein-coding sequence which meets this
criterion is known.
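
By way of illustration only, the lysine criterion above can be
checked for any candidate protein with a short Python sketch. The
function names and the two toy peptides are hypothetical and are not
sequences from this disclosure; the 10% threshold is the lower bound
quoted in the preceding paragraph.

```python
def lysine_fraction(protein: str) -> float:
    """Return the fraction of residues that are lysine (one-letter code K)."""
    seq = "".join(protein.upper().split())
    if not seq:
        raise ValueError("empty protein sequence")
    return seq.count("K") / len(seq)


def meets_lysine_target(protein: str, minimum: float = 0.10) -> bool:
    """Check a sequence against the 10% lower bound (adjustable assumption)."""
    return lysine_fraction(protein) >= minimum


if __name__ == "__main__":
    toy_rich = "MKAGLSV"                # hypothetical peptide: 1 K in 7 residues
    toy_poor = "MAAGLSVVATMAAGLSVVKT"   # hypothetical peptide: 1 K in 20 residues
    for name, seq in (("toy_rich", toy_rich), ("toy_poor", toy_poor)):
        print(name, f"{lysine_fraction(seq):.1%}", meets_lysine_target(seq))
```

The same function could be applied to a full precursor or to the
mature subunits separately, since processing changes the denominator.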

Storage proteins can be modified by incorporating
inserts containing one or more selected amino acids such as
lysine, resulting in a lysine-rich polypeptide that can be
transferred into plant cells. Or, following the design of a
storage protein with a known sequence, a lysine-rich
polypeptide can be synthesized by substitution of specific amino
acids and transferred into a host cell.

There is a recognized need for lysine-rich seed storage
proteins and for an efficient, accurate method of producing
the same. Further, there is also a recognized need for a
method to produce a DNA or cDNA sequence that codes for an
increased amount of any essential amino acid that can be
expressed transgenically as a seed storage protein. A DNA
"coding sequence" is a DNA sequence which is transcribed and
translated into a polypeptide in vivo when placed under the
control of appropriate regulatory sequences. The boundaries
of the coding sequence are determined by the ATG start codon
at the 5' terminus and a translation stop codon at the 3'
terminus. Examples of coding sequences include cDNA, genomic
DNA sequences from cells, and synthetic DNA sequences.
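
As a non-limiting sketch of how the boundaries of a coding sequence
can be located, the following Python code reads from the first ATG to
the first in-frame stop codon using the standard genetic code. The
function name is hypothetical, and the example input is the beginning
of the Figure 1 sequence with a TAG stop appended for illustration.
Input containing characters other than A, C, G and T is assumed not
to occur and would raise a KeyError.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_cds(dna: str) -> str:
    """Translate from the first ATG up to (not including) the first in-frame stop."""
    seq = "".join(dna.upper().split())
    start = seq.find("ATG")
    if start < 0:
        raise ValueError("no ATG start codon found")
    protein = []
    for i in range(start, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[i:i + 3]]
        if residue == "*":
            return "".join(protein)
        protein.append(residue)
    raise ValueError("no in-frame stop codon found")


if __name__ == "__main__":
    # First codons of Figure 1 plus an appended TAG stop (illustrative only).
    print(translate_cds("TAACTGCAG ATG GCA AAC ATT TCT GTG GTT TAG"))
```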

When designing sequences to be rich in certain amino
acids, care must be taken that the substitutions with the
selected amino acids do not influence the stability of the
modified 2S protein. Certain insertions, such as long
stretches of particular amino acids, may result in shapes
and turns which would cause instability, poor expression, or
poor accumulation due to disruption in normal folding
patterns of the protein. In addition, replacement must be
conservative in that hydrophobic amino acids and those
giving charge and polarity are not substituted so that the
overall structure and stability of the molecule will not be
adversely affected. Polarity and direction are due to
acidic (negative) and basic (positive) charges on the amino
acid residues.

To synthesize DNA molecules, the two complementary
strands are constructed separately because only
single-stranded DNA (oligonucleotides) can be synthesized. These
are then hybridized (by formation of hydrogen bonds) and
linked to larger DNA units by enzymatic coupling in order to
construct genes or their regulatory units. A gene is a DNA
sequence responsible for the production of polypeptides. It
is now possible, given the various DNA recombination
techniques, to construct any given gene, whether synthetic or
natural, to reproduce it, and to convert it into
polypeptides using whole cell systems.

Oligonucleotides are polymers built up by the
polycondensation of nucleoside phosphates. In the past, the
majority of synthetic genes have been assembled using
complementary oligonucleotides which represent both entire
strands. Gapped fill-ins refer to the pairing of
complementary nucleotides along sections of DNA where
pairing is incomplete (single-stranded sections) to form
complementary DNA strands for those segments. Gapped
fill-ins have been published only for single pairs of overlapping
oligonucleotides, which limits the length of the target
molecule. Thus, construction of long synthetic sequences
required subcloning (moving a sequence from one vector to
another to produce copies) and/or pasting together of
regions via restriction sites. The only method utilizing an
overlap extension procedure requires splicing of
double-stranded gene fragments. Horton, et al. (1989) Gene 77, 61.
The sequential extension method presented by this
invention obviates the subcloning requirement, and allows
simple, one day assembly of larger gene regions. This
method is approximately 30% more cost-effective, even without
consideration of personnel time, than the usual method of
assembling complete complementary oligonucleotides because
it allows enzymatic synthesis of gap regions. Khorana
(1968) Pure Appl. Chem. 17, 349. A more recent publication
offers similar cost savings by incorporation of a terminal
3' hairpin structure to prime synthesis of the second
strand. However, that method is limited by the length of
oligonucleotides. Uhlmann, et al. (1988) Gene 71, 29.
Another method utilizes short overlap regions to prime
polymerase, but both of these methods rely on ligation of
separate double-stranded regions for assembly. Rink, et al.
(1984) Nucleic Acid Res. 12, 16; Rossi, et al. (1982)
J. Biol. Chem. 257, 9226. A third method relies on in vivo gap
repair, and requires that one strand of synthetic DNA be
complete, though it may contain nicks bridged by short
oligonucleotides of the opposite strand. It has only been
used to assemble a 270 bp fragment. Adams, et al. (1989)
Nucleic Acid Res. 16, 4287.

The advantages of this invention are: (a) it is cost
effective because fewer oligonucleotides are required and
less time is spent in oligonucleotide preparation because
crude oligonucleotides work well; (b) it is a simple
two-reaction (extension/amplification) procedure that is
complete in 1-2 days; (c) it does not require that
restriction sites for assembly by ligation be included in
gene design, hence no unnecessary mutations are introduced;
(d) it enables rapid inclusion of degenerate oligonucleotide
regions if desired, without separate assembly or cloning
reactions; and (e) it enables the assembly of chimeric genes
without the introduction of mutagenic restriction sites,
i.e., it enables "perfect" promoter-gene fusions.

The present invention further provides improvements in
the nutritional value of edible organisms, including, but
not limited to, higher plants. In particular, the present
invention provides for the assembly of synthetic
oligonucleotides by means of overlapping sequences, including the
nucleic acid sequences encoding the lysine-rich proteins.

In one embodiment, the present invention provides
nucleic acids in the form of a DNA molecule, which encode
one or more subunits of a lysine-rich (approximately 14%) 2S
seed storage-type protein. Other isoforms will be at least
about 80% homologous at the amino acid sequence level to
this representative member, preferably at least about 85%
homologous, and more preferably at least about 90%
homologous.
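
The text does not define how percent homology is measured. Purely as
an illustrative assumption, the Python sketch below treats it as
percent identity over two sequences that have already been aligned to
equal length, with a gap character in either sequence counted as a
mismatch. The function name and the toy peptides are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned positions carrying the same residue.

    Both inputs must already be aligned to the same length; a gap ('-')
    in either sequence is counted as a mismatch (an assumption).
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    reference = "MKAGLSVVAT"   # hypothetical 10-residue peptide
    variant = "MKAGLSVKAT"     # one substitution relative to the reference
    print(percent_identity(reference, variant))
```

Under this reading, an isoform meeting the "at least about 80%"
threshold could differ from the representative member at up to about
one residue in five.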

In a further embodiment, the present invention provides
a cell comprising a replicon containing the
chemically-synthesized, lysine-rich 2S storage protein combined with a
promoter which includes regulatory sequences that provide
for the expression of said protein in said cell, said
subunit being heterologous to said cell. In particularly
preferred embodiments, the cellular host is a higher plant
or animal cell.



Brief Description of the Figures 

Figure 1 shows the complete nucleic acid sequence of a
2S seed storage protein with increased lysine content. The
double-stranded molecule is cleaved with restriction enzymes
(PstI and EcoRI) at bases as indicated to allow cloning.
Amino acid residues are numbered beneath the sequence.
Mature protein is comprised of residues 39-74 (small
subunit) crosslinked via S-S bonds to residues 85-170 (large
subunit). Residues 1-38 constitute a signal sequence and
N-terminus processed site. Residues 75-84 constitute a
"linker" type peptide, which is excised as the protein
folds. Residue 171 is a carboxy-terminal residue which is
also excised at protein maturity.

Figure 2 shows the oligonucleotides used in
construction of the 2S seed protein and their design as pairs
sharing complementary overlap regions of 17-24 nucleotides.
Each pair has a similar overlap with the adjacent pair.

Figure 3 shows the first and second sequential
extension products that are formed as the six extensions are
implemented.

Figure 4 shows the third, fourth, fifth, and sixth
sequential extension products that are formed as the six
extensions are completed.

Disclosure of the Invention 


In addition to the techniques described below, the
practice of the present invention will employ conventional
techniques of molecular biology, microbiology, recombinant
DNA technology, and plant science, all of which are within
the skill of the art. Such techniques are explained fully
in the literature. See, e.g., Maniatis et al., Molecular
Cloning: A Laboratory Manual (1982); DNA Cloning: Volume
I and II (D.N. Glover, ed., 1985); Oligonucleotide Synthesis
(M.J. Gait, ed., 1984); Nucleic Acid Hybridization (B.D.
Hames & S.J. Higgins, eds., 1985); Transcription and
Translation (B.D. Hames & S.J. Higgins, eds., 1984); Animal Cell
Culture (R.I. Freshney, ed., 1986); Plant Cell Culture (R.A.
Dixon, ed., 1985); Propagation of Higher Plants Through
Tissue Culture (K.W. Hughes et al., eds., 1978); Cell
Culture and Somatic Cell Genetics of Plants (I.K. Vasil,
ed., 1984); Fraley et al. (1986) CRC Critical Reviews in
Plant Sciences 4, 1; Biotechnology in Agricultural
Chemistry: ACS Symposium Series 334 (LeBaron et al. eds.,
1987), the disclosures of which are well-known and are hereby
incorporated herein by reference.

The design of the prototypical synthetic plant gene
herein is based on published regulatory sequences, including
reported enhancer (repetitive) regions found uniquely in
seed storage genes. In addition, computer modeling of both
hydropathy and evolutionary relatedness of known seed
proteins was used in the planning of potential coding
sequences, as well as inclusion of codon biases found in
published storage protein gene sequences.
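
The hydropathy modeling referred to above is not specified further.
As an illustration under stated assumptions, the following Python
sketch computes a sliding-window hydropathy profile using the
Kyte-Doolittle scale, which is one commonly used scale and is not
necessarily the one used in the design of the present gene. The
window width of 7 and the function name are likewise assumptions.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not stated in the text).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 7) -> list[float]:
    """Mean hydropathy of each window of `window` consecutive residues."""
    seq = protein.upper()
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD[residue] for residue in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical peptide: a hydrophobic stretch followed by lysines.
    for position, value in enumerate(hydropathy_profile("IIIKKK", window=3), 1):
        print(position, round(value, 2))
```

Comparing the profile of a native sequence with that of a
lysine-substituted design is one way to see whether a substitution
shifts a hydrophobic region toward a more polar character.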

With respect to the choice of the regions to be
modified, the present invention varies significantly from other
work which has been done in this field. European Patent
Application No. 318,341 describes a method for replacement
or supplementation of the hypervariable region of a 2S
albumin gene. Based on a model of the Arabidopsis thaliana
2S albumin, the hypervariable region is defined as a section
of the large subunit of the protein between the sixth and
seventh cysteine residues where little conservation of amino
acids is observed. A non-conserved region is a region
wherein the nucleotide sequence can be modified either by
insertion into it or replacement of a nucleotide sequence
which, at least in part, may be foreign to the natural
nucleic acid encoding the precursor of the 2S albumins of
the plant cells concerned and encodes the appropriate amino
acids, without disturbing the stability and correct
processing of the storage protein or its transport into parts of
the cell. The modification procedure is called
site-directed mutagenesis.

The synthetic gene sequence was constructed by the
general process of sequential extension of overlapping 3'
ends using DNA polymerase. The sequence was designed to be
assembled from six pairs of synthetic oligonucleotides
(partial sequences), each having 3' overlap within the pair,
as well as 3' overlap between adjacent pairs. Assembly
comprises three parts: filling in pairs to create
double-stranded segments; combining all duplexed segments and
sequentially extending to form a small number of full length
genes; and amplifying (PCR) complete molecules to a quantity
sufficient for cloning. It is an efficient and streamlined
procedure, useful for constructing large genes with little
or no possibility of misjoinder and without the need for
intermediate vectors. Numerous pairs of partial sequences
can be used to assemble large synthetic genes. There is no
limit to the size of predetermined gene structure that this
synthetic strategy will allow. Accordingly, it is
anticipated that this invention will find important
utilization by those skilled in the art.
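
The oligonucleotide layout described above can be checked in silico
before synthesis. The Python sketch below is an illustration only: it
tiles a target with alternating-strand oligonucleotides that share 3'
overlaps and then rebuilds the top strand by joining neighbours on
those overlaps. It does not model polymerase kinetics or the number
of extension cycles, and the function names, the 20-nucleotide oligo
length and the 8-nucleotide overlap are hypothetical values chosen to
keep the example short (the overlaps described herein are 17-24
nucleotides). The target used is the first 48 bases of Figure 1.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def split_into_oligos(target: str, oligo_len: int, overlap: int) -> list[str]:
    """Tile `target` with oligos on alternating strands, written 5'->3'.

    Even-numbered oligos are top strand, odd-numbered ones bottom strand;
    neighbouring oligos share `overlap` nucleotides.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be between 1 and oligo_len - 1")
    oligos, pos, index = [], 0, 0
    while True:
        piece = target[pos:pos + oligo_len]
        oligos.append(piece if index % 2 == 0 else revcomp(piece))
        if pos + oligo_len >= len(target):
            return oligos
        pos += oligo_len - overlap
        index += 1


def assemble(oligos: list[str], min_overlap: int) -> str:
    """Rebuild the top strand by joining neighbours on their shared overlap."""
    top = oligos[0]
    for index, oligo in enumerate(oligos[1:], start=1):
        piece = oligo if index % 2 == 0 else revcomp(oligo)
        # Try the longest possible overlap first, down to the minimum allowed.
        for k in range(min(len(top), len(piece)), min_overlap - 1, -1):
            if top.endswith(piece[:k]):
                top += piece[k:]
                break
        else:
            raise ValueError(f"oligo {index} has no overlap of {min_overlap}+ nt")
    return top


if __name__ == "__main__":
    target = "TAACTGCAGATGGCAAACATTTCTGTGGTTGCTGCTGCACTACTGGTC"
    oligos = split_into_oligos(target, oligo_len=20, overlap=8)
    print(len(oligos), assemble(oligos, min_overlap=8) == target)
```

A design in which a repeated stretch is longer than the minimum
overlap could be joined at the wrong position by this simple check,
which mirrors the concern about mismatched annealing in the
laboratory procedure.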

In one embodiment, each pair is filled in by combining
two (paired) oligonucleotides (100 pmol each) in a suitable
solution for bonding, comprising 15 µM each dNTP, 40 mM
Tris-Cl pH 7.5, 20 mM MgCl2, 50 mM NaCl, and 10 mM DTT (25 µl
final volume). The oligonucleotide mix is heat denatured
(95°C) and allowed to anneal by slowly cooling to room
temperature. Heat-sensitive DNA polymerase (examples: E. coli
"Klenow", Sequenase (registered trademark, US Biochemical))
is added (1.5 U) and the reaction allowed to proceed 10
minutes at room temperature. Alternatively, heat-stable
polymerase (e.g., Taq polymerase) may be substituted if the
buffer is replaced with 50 mM KCl, 10 mM Tris-Cl pH 8.3,
1.5 mM MgCl2, 0.01% BSA, and the reaction mix annealed at
55°C and extended at 72°C.
Sequential extension of these pairs is accomplished by
combining aliquots of each of the above reactions, adding
sufficient dNTPs, and sequentially heating, reannealing, and
extending in the presence of polymerase. This is easily
accomplished using Taq polymerase and commercially available
heat cycling blocks (e.g., DNA Thermal Cycler
(Perkin-Elmer/Cetus)), and requires buffer adjustment as noted
above. Heat-labile polymerase may be substituted, but
requires manual transfer of tubes between heat blocks of
suitable temperature. The number of cycles required to
generate full length sequences is dependent on the number of
duplexed components, and is minimally half that number. To
generate sufficient full length molecules to allow gel
detection, the molecules must be cycled a greater number of
times. In the example from the previous paragraph, the
partial sequences were sequentially extended for a total of
12 cycles in order to discern full length molecules.
Obtaining a clonable amount of this gene sequence is
possible using PCR, and requires only a small portion (2%)
of the sequential extension reaction as template.

Modes for carrying out the Invention

Example 1

Design of the protein

A putative 2S seed storage protein sequence was derived
from published protein sequences (Crouch, et al. (1983)
J. Mol. Appl. Gen. 2, 273; Ericson, et al. (1986) J. Biol.
Chem. 261, 14576; Altenbach, et al. (1987) Plant Mol. Biol.
8, 239; Krebbers, et al. (1988) Plant Physiol. 87, 859),
and by using peptide sequence data from various Brassica
spp. obtained in this laboratory (unpublished). These
members of the 2S class of seed storage proteins are
synthesized as precursor polypeptides of 15-21 kDa and
undergo a number of processing steps to yield the stored
protein, comprised of a large and a small subunit of
combined MW of 9-17 kDa. The proposed protein sequence
(Figure 1) includes all processing regions typical of such
2S seed proteins. The first 22 amino acids should function
as a transit peptide to direct protein inclusion in storage
bodies (Chrispeels, et al. 1982 J. Cell Biol. 93:306). In
addition to the first 22 amino acids, residues 23-38, 75-84,
and 171 are those amino acids which should be deleted in the
final stored product by processing steps typical of these 2S
seed proteins. The accumulated protein should thus be two
subunits of 4.4 kDa (residues 39-74) and 9.7 kDa (residues
85-170). Codons were selected for the synthetic gene based
on observed codon biases in seed storage proteins (data not
shown).
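
The processing scheme and the codon tally described above can be
sketched in Python as follows. This is an illustration only: the
residue ranges are those stated in this Example, while the function
names are hypothetical and the codon count merely tallies codons in a
given coding sequence (it is not the codon bias data referred to as
not shown). The row of codons in the usage example is taken from
Figure 1 (bases 385-432).

```python
from collections import Counter

# Residue ranges (1-based, inclusive) as stated in the text.
SMALL_SUBUNIT = (39, 74)
LARGE_SUBUNIT = (85, 170)


def excise(precursor: str, first: int, last: int) -> str:
    """Return residues first..last (1-based, inclusive) of the precursor."""
    if not 1 <= first <= last <= len(precursor):
        raise ValueError("residue range lies outside the precursor")
    return precursor[first - 1:last]


def mature_subunits(precursor: str) -> tuple[str, str]:
    """Small and large subunits left after the processing steps."""
    return excise(precursor, *SMALL_SUBUNIT), excise(precursor, *LARGE_SUBUNIT)


def codon_usage(cds: str) -> Counter:
    """Count each codon in an in-frame coding sequence (whitespace ignored)."""
    seq = "".join(cds.upper().split())
    if len(seq) % 3:
        raise ValueError("coding sequence length is not a multiple of three")
    return Counter(seq[i:i + 3] for i in range(0, len(seq), 3))


if __name__ == "__main__":
    placeholder = "X" * 171  # stand-in for the 171-residue precursor
    small, large = mature_subunits(placeholder)
    print(len(small), len(large))
    row = "CAG CAA CAA GGC CAG CAA CAA GGT AAG CAG CAG ATG GTT AGG AAG ATC"
    usage = codon_usage(row)
    print(usage["CAA"], usage["CAG"], usage["AAG"])
```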

Example 2 


Synthesis of oligonucleotides 

Oligonucleotides from 56 to 69 nucleotides in length
were synthesized on an Applied Biosystems Model 380B
synthesizer, deblocked, treated with ammonia at 50°C,
vacuum-dried and resuspended in water. The oligonucleotides
were used with no further purification.

Oligonucleotides used in this construction were
designed as pairs sharing complementary overlap regions of
17-24 nucleotides, each pair having a similar overlap with
the adjacent pair (Figure 2). Following denaturation and
annealing with all oligomers present in the reaction,
molecules of the most stable duplex structure formed, and
allowed extension of the duplex from the overlaps.
Repetition of such extensions produced successively longer
molecules, hence progressively larger regions of
complementation. Sequential extension products are shown
schematically in Figure 3. The first extension reaction can
yield only those products shown, and required polymerase
fill-ins of 37-51 nucleotides from overlap regions of 17-24
base pairs in the claimed synthetic gene. The second round
of extension must also proceed from minimal overlaps (17-18
base pairs), with the addition of 79-102 nucleotides to the
complementary regions. Beginning with the third extension,
progressively larger overlaps were available. Only the
longest, hence most stable, duplex conformations are shown
in Figure 3. At the end of the third extension reaction
some completed molecules were present in the reaction. A
total of six extensions increased the probability of
obtaining complete sequences.

Example 3 


Amplification 

An aliquot of the extension products served as a
template for in vitro amplification using distal 5' and 3'
oligonucleotides (oligos 1+ and 6-) as primers. Both the
Taq polymerase and the T7 DNA polymerase extension reactions
yielded single Taq amplification products of the expected
530 bp.

Example 4

Cloning and expression 

The amplification products of Example 3 were gel
purified, cut at the PstI and EcoRI sites included at the 5' and
3' ends of the synthetic sequence, and cloned into similarly
digested pTZ18U. Recombinant plasmids were transfected into
DH5α and plated on selective media containing X-Gal. White
colonies were selected for mini-preps of DNA, and screened
for the presence of the 206 bp BglII fragment. Six of the
Taq-extended clones and seven of the T7-extended clones were
sequenced completely at least once in each direction, and
the sequence analysis results are shown in Table 1. One of
six clones from Taq extension and one of seven clones from
T7 extension contained perfect constructs. The clones from
the Taq extension contained a total of 10 induced single
base pair mutations: 6 substitutions, 3 deletions and one
insertion. The sum mutation rate with Taq extension was
thus 10/(6x530) or 1 mutation per 318 nucleotides. T7
extensions contained considerably more mutations, including
10 substitutions, one insertion and 3 deletions of 2, 3 and
9 base pairs. The sum mutation rate with T7 polymerase
extensions was thus 25/(7x530) or 1 mutation per 148
nucleotides.
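
The two rates quoted above can be reproduced with the short Python
sketch below. It assumes, as the figure of 25 appears to imply, that
each deleted base pair in the T7 clones is counted individually
(10 substitutions + 1 insertion + 2 + 3 + 9 deleted base pairs). The
function name is hypothetical.

```python
def bases_per_mutation(mutated_bases: int, clones: int, insert_len: int) -> float:
    """Sequenced nucleotides per mutated base across all clones."""
    if mutated_bases <= 0:
        raise ValueError("at least one mutated base is required")
    return clones * insert_len / mutated_bases


INSERT_LEN = 530  # length of the synthetic insert in base pairs

# Taq: 6 substitutions, 3 single-base deletions, 1 insertion in 6 clones.
TAQ_MUTATED = 6 + 3 + 1
# T7: 10 substitutions, 1 insertion, deletions of 2, 3 and 9 bp in 7 clones
# (each deleted base pair counted once; an assumption about the tally).
T7_MUTATED = 10 + 1 + (2 + 3 + 9)

if __name__ == "__main__":
    print(f"Taq: 1 per {bases_per_mutation(TAQ_MUTATED, 6, INSERT_LEN):.0f} nt")
    print(f"T7:  1 per {bases_per_mutation(T7_MUTATED, 7, INSERT_LEN):.0f} nt")
```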

Mini plasmid preps used to screen for the BglII fragment
were digested with EcoRI and PstI, Southern blotted and
examined by hybridization to a probe prepared from the
complete insert of the correct synthetic gene clone, pTL315.
It was found that of clones produced by Taq extension, only
those possessing the BglII fragment contained any portion of
the synthetic gene. However, 24 clones obtained through T7
extension contained some portion of the synthetic gene, and
only six of these included the predicted BglII fragment.
More amplification products result from the T7 extension
mixes than from those of Taq. It is likely that the lower
temperature (37°) used for T7 extensions allowed more
mismatches during annealing and extension than that allowed
during the Taq (72°) extensions.

Table 1

Clones selected from sequential extensions

Designation    Enzyme used      Mutation         Location (a)
               in extensions

pTL314         Taq              A->T             OL(S)
                                G deletion       Fl
pTL315         Taq              none
pTL332         Taq              A deletion       OL(S)
pTL333         Taq              A->G             OL(F)
pTL340         Taq              A deletion       Fl
                                C insertion      Fl
pTL344         Taq              G->T             Fl
                                A->C             Fl
                                CA->AC           Fl
pTL41[?]       Sequenase        [illegible]      [illegible]
                                T->C             OL(S)
pTL414         Sequenase        [illegible]      Fl
                                A insertion      Fl
pTL423         Sequenase        [illegible]      Fl
                                [illegible]      OL(F)
pTL459         Sequenase        none
pTL478         Sequenase        GTG deletion     Fl
                                T->C             Fl
pTL652         Sequenase        A->G             Fl
                                G->A             Fl
pTL657         Sequenase        9 bp deletion    Fl
                                A->G             OL(S)
                                T->C             Fl
                                C->A             Fl

(a) OL(S): overlap during first sequential extension
    OL(F): overlap during paired oligonucleotide fill-in
    Fl: fill-in region during either of the above reactions

Industrial Applicability

Directly or indirectly, animals obtain their essential
amino acids (those they are unable to synthesize) from
eating plants. Most seeds, the major plant protein sources,
are deficient in one or more amino acids essential for
proper nutrition of higher animals. Dicotyledonous seeds,
such as legumes, generally lack sufficient sulfur-containing
amino acids (cysteine and methionine), while
monocotyledonous plants (cereals) typically lack adequate lysine, as
well as tryptophan and threonine. Plants can serve as
adequate amino acid sources if complementary seeds (e.g., rice
and beans) are ingested simultaneously, and in the proper
quantity.

Cereals and legumes are combined in this complementary
way in the formulation of diets for swine. Current feeding
practices in the United States utilize 85% corn and 15%
soybean meal in swine diets. The predominance of corn as
the major dietary component is due mainly to its low cost
and high carbohydrate content. The low protein levels are
supplemented with soybean meal to provide adequate protein
nutrition. Because corn is particularly deficient in lysine
(2%), added soybean, although sufficient in lysine (6.4%)
when used as the sole protein source, cannot raise lysine
levels to those necessary for maximum swine growth. Thus
swine feed is frequently supplemented with "synthetic"
lysine. Current levels of supplemental lysine average about
1 kg per metric ton of feed at a cost of $4.50/kg lysine.
The U.S. market for lysine (primarily used in feeds) is
20 Mkg, resulting in retail sales of $100M. Strategies to
reduce this supplementation of lysine include the use of
newly developed high-lysine (3.3%) corn varieties. These
varieties may obviate the need for lysine addition to feed
in the future. However, high-lysine varieties have not yet
been widely accepted by farmers, because they typically show
poor growth and low yield characteristics. Additionally,
existing high-lysine corn lines are the result of a
recessive mutation, which increases the difficulty of
breeding this characteristic into popular varieties.
Therefore, these varieties of corn are an expensive source
of high-lysine protein.

A reasonable alternative is to enhance lysine levels in
corn, soybean, and other crops through introduction of new
seed storage protein genes. For example, soymeal is a
component of animal feeds because of its high protein
quality and content. A modest increase in soy protein
lysine levels may be of great benefit to the feed market due
to the high quality protein background in soybean.
Molecular biology now provides the tools to alter amino acid
composition via gene transfer and provide, through this
invention, for the nutritional enhancement of soybeans and
other crops.

Sequence Listing

(1) GENERAL INFORMATION:

(i) APPLICANT: Barbara Ballo

(ii) TITLE OF INVENTION: Process for Enhancing the Content of a Selected Amino Acid in a Seed Storage Protein

(iii) NUMBER OF SEQUENCES: 13

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Pioneer Hi-Bred International, Inc.
(B) STREET: 700 Capital Square, 400 Locust Street
(C) CITY: Des Moines
(D) STATE: Iowa
(E) COUNTRY: United States
(F) ZIP: 50309

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Diskette, 3.5 inch, 720 kb storage
(B) COMPUTER: IBM Compatible
(C) OPERATING SYSTEM: MS-DOS
(D) SOFTWARE: WORDPERFECT

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NO.
(B) FILING DATE:
(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Pearlmutter, Nina L.
(B) REGISTRATION NUMBER: 35,639
(C) REFERENCE/DOCKET NUMBER: 0215 US

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: (515) 245-3596
(B) TELEFAX: (515) 245-3634
(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 533 bases
(B) TYPE: nucleotide
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: synthetic DNA

(iii) HYPOTHETICAL: No

(iv) ANTI-SENSE: N/A

(xi) SEQUENCE DESCRIPTION: Seq. ID. No. 1
TAACTGCAG ATG GCA AAC ATT TCT GTG GTT GCT GCT GCA CTA CTG GTC 48
Met Ala Asn Ile Ser Val Val Ala Ala Ala Leu Leu Val
1 5 10

TTG CTG GTG TTG GGT CAT GCC ACT GCA AGC ATC TAC AGG ACA GTT GTG 96
Leu Leu Val Leu Gly His Ala Thr Ala Ser Ile Tyr Arg Thr Val Val
15 20 25

GAG TTT GAA GAG GAT GAT GCC ACC AAC CCA ATA GGT CCT AAG ATG AGG 144
Glu Phe Glu Glu Asp Asp Ala Thr Asn Pro Ile Gly Pro Lys Met Arg
30 35 40 45

AAA TGC AGA AAG GAG TTC CAG AAG GAA CAA ATG TTG AGA GCT TGC CAA 192
Lys Cys Arg Lys Glu Phe Gln Lys Glu Gln Met Leu Arg Ala Cys Gln
50 55 60 65

CAA TGG TTG AGG AAA CAA GCT AGA CAA GGA AGA TCT GAT GAA TTT GAC 240
Gln Trp Leu Arg Lys Gln Ala Arg Gln Gly Arg Ser Asp Glu Phe Asp
70 75 80 85

TTT GAA GAT GAC ATG GAG AAT CCT CAA GGA CCA CAG CAG AGA CCT CCT 288
Phe Glu Asp Asp Met Glu Asn Pro Gln Gly Pro Gln Gln Arg Pro Pro
90 95 100

CTC CTT CAG AAG TGC TGT GAG CAA CTC AAA CAG ATG CAA TCT CAG TGT 336
Leu Leu Gln Lys Cys Cys Glu Gln Leu Lys Gln Met Gln Ser Gln Cys
105 110 115

GTT TGC CCA ACC CTT AAA GGT GCC AGC AAA GCT GTG AAA CAG GAA GAG 384
Val Cys Pro Thr Leu Lys Gly Ala Ser Lys Ala Val Lys Gln Glu Glu
120 125 130

CAG CAA CAA GGC CAG CAA CAA GGT AAG CAG CAG ATG GTT AGG AAG ATC 432
Gln Gln Gln Gly Gln Gln Gln Gly Lys Gln Gln Met Val Arg Lys Ile
135 140 145

TAT AAG ACT GCC AAA CAC CTT CCT AAA GTC TGT GAC ATT CCA CAG GTT 480
Tyr Lys Thr Ala Lys His Leu Pro Lys Val Cys Asp Ile Pro Gln Val
150 155 160 165

GAT GTA TGC CCA TTT CAG AAG ACC ATG CCT GGG CCC TCA TAC TAGAATT 529
Asp Val Cys Pro Phe Gln Lys Thr Met Pro Gly Pro Ser Tyr ***
170 175

CAAT 533
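The codon-by-codon translation printed with this sequence can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code; the helper name `translate` and the choice of the first and last rows of the coding region as excerpts are illustrative and not part of the original disclosure. One-letter amino acid codes are used in the output.

```python
from itertools import product

# Standard genetic code, with codons enumerated in the base order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string (spaces ignored) codon by codon; '*' marks a stop."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Excerpts copied from the coding region of Seq. ID. No. 1 (illustrative choice).
first_row = "ATG GCA AAC ATT TCT GTG GTT GCT GCT GCA CTA CTG GTC"
last_row = "GAT GTA TGC CCA TTT CAG AAG ACC ATG CCT GGG CCC TCA TAC TAG"

print(translate(first_row))  # MANISVVAAALLV
print(translate(last_row))   # DVCPFQKTMPGPSY*
```

The two outputs correspond to the three-letter translations printed under the first and last rows of the coding sequence.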
(3) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 69 bases
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: synthetic DNA

(iii) HYPOTHETICAL: No

(iv) ANTI-SENSE: No

(xi) SEQUENCE DESCRIPTION: Seq. ID. No. 2

TAACTGCAGA TGGCAAACAT TTCTCTGGTT GCTGCTGCAC TACTGGTCTT GCTGGTGTTG 60
GGTCATGCC 69

(4) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 69 bases
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: synthetic DNA

(iii) HYPOTHETICAL: No

(iv) ANTI-SENSE: Yes

(xi) SEQUENCE DESCRIPTION: Seq. ID. No. 3

GGTGGCATCA TCCTCTTCAA ACTCCACAAC TGTCCTGTAG ATGCTTGCAG TGGCATGACC 60
CAACACCAG 69
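The sense oligonucleotide of Seq. ID. No. 2 and the antisense oligonucleotide of Seq. ID. No. 3 appear designed to anneal through their 3' ends. A small sketch of how that overlap could be checked follows; the function names are hypothetical, and the sequences are copied as printed in the listing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_overlap(sense: str, antisense: str) -> str:
    """Longest 3' stretch of `sense` that base-pairs with the 3' end of `antisense`."""
    target = reverse_complement(antisense)
    for k in range(min(len(sense), len(target)), 0, -1):
        if sense.endswith(target[:k]):
            return sense[-k:]
    return ""


# Sequences as printed in the listing (block spacing removed).
SEQ_ID_2 = ("TAACTGCAGA TGGCAAACAT TTCTCTGGTT GCTGCTGCAC "
            "TACTGGTCTT GCTGGTGTTG GGTCATGCC").replace(" ", "")
SEQ_ID_3 = ("GGTGGCATCA TCCTCTTCAA ACTCCACAAC TGTCCTGTAG "
            "ATGCTTGCAG TGGCATGACC CAACACCAG").replace(" ", "")

overlap = three_prime_overlap(SEQ_ID_2, SEQ_ID_3)
print(len(overlap), overlap)  # 18 CTGGTGTTGGGTCATGCC
```

Under these assumptions the two 69-base oligonucleotides share an 18-base complementary region, leaving single-stranded stretches on either side to be filled in.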
(5) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 57 bases
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: synthetic DNA

(iii) HYPOTHETICAL: No

(iv) ANTI-SENSE: No

(xi) SEQUENCE DESCRIPTION: Seq. ID. No. 4

GAAGAGGATG ATGCCACCAA CCCAATAGGT CCTAAGATGA GGAAATGCAG AAAGGAG 57

(6) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 56 bases
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: synthetic DNA

(iii) HYPOTHETICAL: No

(iv) ANTI-SENSE: Yes

(xi) SEQUENCE DESCRIPTION: Seq. ID. No. 5

CCATTGTTGG CAAGCTCTCA ACATTTGTTC CTTCTGGAAC TCCTTTCTGC ATTTCC 56
(7) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 59 bases
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: synthetic DNA

(iii) HYPOTHETICAL: No

(iv) ANTI-SENSE: No

(xi) SEQUENCE DESCRIPTION: Seq. ID. No. 6

GAGCTTGCCA ACAATGGTTG AGGAAACAAG CTAGACAAGG AAGATCTGAT GAATTTGAC 59

(8) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 61 bases
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: synthetic DNA

(iii) HYPOTHETICAL: No

(iv) ANTI-SENSE: Yes

(xi) SEQUENCE DESCRIPTION: Seq. ID. No. 7

GGTCTCTGCT GTGGTCCTTG AGGATTCTCC ATGTCATCTT CAAAGTCAAA TTCATCAGAT 60
C 61
(9) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 57 bases
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: synthetic DNA

(iii) HYPOTHETICAL: No

(iv) ANTI-SENSE: No

(xi) SEQUENCE DESCRIPTION: Seq. ID. No. 8

GGACCACAGC AGAGACCTCC TCTCCTTCAG AAGTGCTGTG AGCAACTCAA ACAGATG 57

(10) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 64 bases
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: synthetic DNA

(iii) HYPOTHETICAL: No

(iv) ANTI-SENSE: Yes

(xi) SEQUENCE DESCRIPTION: Seq. ID. No. 9

CAGCTTTGCT GGCACCTTTA AGGGTTGGGC AAACACACTG AGATTGCATC TGTTTGAGTT 60
GCTC 64
(11) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 65 bases
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: synthetic DNA

(iii) HYPOTHETICAL: No

(iv) ANTI-SENSE: No

(xi) SEQUENCE DESCRIPTION: Seq. ID. No. 10

AAGGTGCCAG CAAAGCTGTG AAACAGGAAG AGCAGCAACA AGGCCAGCAA CAAGGTAAGC 60
AGCAG 65

(12) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 56 bases
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: synthetic DNA

(iii) HYPOTHETICAL: No

(iv) ANTI-SENSE: Yes

(xi) SEQUENCE DESCRIPTION: Seq. ID. No. 11

GGAAGGTGTT TGGCAGTCTT ATAGATCTTC CTAACCATCT GCTGCTTACC TTGTTG 56
(13) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 59 bases
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: synthetic DNA

(iii) HYPOTHETICAL: No

(iv) ANTI-SENSE: No

(xi) SEQUENCE DESCRIPTION: Seq. ID. No. 12

GACTGCCAAA CACCTTCCTA AAGTCTGTGA CATTCCACAG GTTGATGTAT GCCCATTTC 59

(14) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 57 bases
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: synthetic DNA

(iii) HYPOTHETICAL: No

(iv) ANTI-SENSE: Yes

(xi) SEQUENCE DESCRIPTION: Seq. ID. No. 13

ATTGAATTCT AGTATGAGGG CCCAGGCATG GTCTTCTGAA ATGGGCATAC ATCAACC 57
WHAT IS CLAIMED IS:

1. A method of making an improved seed storage protein
by altering a naturally-occurring seed storage protein having
a known amino acid sequence to increase its content of a
selected amino acid, comprising the steps of:

a. identifying conserved, non-conserved and hypervariable
residues in the amino acid sequence of the naturally-occurring
protein by comparison of the amino acid sequence of the protein
with amino acid sequences of other homologous seed storage
proteins; and

b. replacing one or more non-conserved DNA residues coding
for the protein with DNA residues coding for the selected
amino acid, provided that

i) the replacement is conservative with respect to
hydrophobicity, polarity and charge, and

ii) the replacement does not create any pairs of adjacent
amino acids which are not found in the naturally-occurring
seed storage protein or the homologous seed storage proteins.
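Proviso (ii) of claim 1 lends itself to a mechanical check. The sketch below is one possible reading, not the applicant's procedure: it collects every adjacent amino acid pair found in the natural protein and its homologs, then reports any pair a proposed substitution would newly create. The function names and the short toy sequences are hypothetical, positions are zero-based, and the conservativeness test of proviso (i) is not modeled.

```python
def adjacent_pairs(sequences):
    """Set of all adjacent residue pairs occurring in the given protein sequences."""
    return {s[i:i + 2] for s in sequences for i in range(len(s) - 1)}


def novel_pairs(protein, position, new_residue, allowed):
    """Pairs created by substituting `new_residue` at `position` that are not in `allowed`."""
    altered = protein[:position] + new_residue + protein[position + 1:]
    first = max(position - 1, 0)             # pair ending at the altered residue
    last = min(position, len(altered) - 2)   # pair starting at the altered residue
    touched = [altered[i:i + 2] for i in range(first, last + 1)]
    return [pair for pair in touched if pair not in allowed]


# Toy data for illustration only (not taken from the disclosure).
natural = "MANISVV"
homologs = ["MAKISLV"]
allowed = adjacent_pairs([natural] + homologs)

print(novel_pairs(natural, 2, "K", allowed))  # []  (AK and KI occur in the homolog)
print(novel_pairs(natural, 4, "K", allowed))  # ['IK', 'KV']
```

In this toy case the lysine substitution at position 2 would pass the adjacency proviso, while the one at position 4 would be rejected because it creates two pairs seen in neither sequence.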

2. A method according to claim 1 comprising the further
steps of synthesizing a DNA sequence which codes for the
altered seed storage protein and synthesizing the altered
seed storage protein by transcription and translation of the
DNA sequence in a living cell.

3. A method according to claim 2 wherein the DNA
sequence is synthesized by site-directed mutagenesis of a
DNA sequence which codes for the naturally-occurring seed
storage protein.

4. A method according to claim 2 wherein the DNA
sequence which codes for the naturally-occurring seed
storage protein is genomic DNA.

5. A method according to claim 3 wherein the DNA
sequence which codes for the naturally-occurring seed
storage protein is genomic DNA.

6. A method according to claim 2 wherein the DNA
sequence is SEQ ID NO: 1 or a DNA sequence at least 80%
homologous thereto.

7. A method according to claim 3 wherein the DNA
sequence is SEQ ID NO: 1 or a DNA sequence at least 80%
homologous thereto.

8. A method according to claim 2 wherein the DNA
sequence is synthesized by the steps of:

a. synthesizing a set of single-stranded partial
DNA sequences capable of being assembled in complementary
overlapping relationship to provide the complete DNA
sequence of the altered protein, each partial sequence having
a length of less than about 100 base pairs, each partial
sequence having 3' and 5' oligonucleotide ends which are
complementary to the respective 3' and 5' oligonucleotide
ends of the partial sequences which are respectively 3' and
5' to the partial sequence in the complete sequence of the
altered protein; and

b. annealing the partial sequences to produce
extended sequences consisting of two or more partial
sequences in complementary overlapping relationship;

c. filling nucleotide gaps in the extended
sequences to produce double-stranded extended sequences;

d. denaturing the double-stranded extended
sequences to produce longer sequences consisting of two or
more partial sequences; and

e. repeating steps (b) through (d) until the
extended sequence produced by step (c) is the complete DNA
sequence of the altered protein.

9. A method of synthesizing a complete DNA sequence
comprising the steps of:

a. synthesizing a set of single-stranded partial
DNA sequences capable of being assembled in complementary
overlapping relationship to provide the complete DNA
sequence, each partial sequence having 3' and 5' ends which
are complementary to the respective 3' and 5' ends of the
partial sequences which are respectively 3' and 5' to the
partial sequence in the complete sequence; and

b. annealing the partial sequences to produce
extended sequences consisting of two or more partial
sequences in complementary overlapping relationship;

c. filling nucleotide gaps in the extended
sequences to produce double-stranded extended sequences;

d. denaturing the double-stranded extended
sequences to produce longer sequences consisting of two or
more partial sequences; and

e. repeating steps (b) through (d) until
the extended sequence produced by step (c) is the complete DNA
sequence.
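The anneal, fill, denature and repeat cycle recited in steps (b) through (e) can be sketched in software. The following is an idealized simulation under stated assumptions, not a description of the laboratory protocol: strands are plain strings written 5' to 3', annealing is modeled as an exact 3' overlap of at least `min_overlap` bases (a hypothetical parameter), and only the top strand of each duplex is tracked. The function names are illustrative; the four oligonucleotides are Seq. ID. Nos. 2 to 5 as printed in the sequence listing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def anneal_and_fill(top, bottom, min_overlap=10):
    """Steps (b) and (c): anneal two strands by their 3' ends and fill the gaps.

    Returns the top strand of the filled duplex, or None if no overlap is found.
    """
    template = reverse_complement(bottom)
    for k in range(min(len(top), len(template)), min_overlap - 1, -1):
        if top.endswith(template[:k]):
            return top + template[k:]
    return None


def assemble(pairs, min_overlap=10):
    """Run extension rounds until a single product remains; return (product, rounds)."""
    # First extension: each sense/antisense oligo pair yields one duplex.
    duplexes = [anneal_and_fill(s, a, min_overlap) for s, a in pairs]
    if any(d is None for d in duplexes):
        raise ValueError("an oligo pair has no usable 3' overlap")
    rounds = 1
    while len(duplexes) > 1:
        # Step (d): denature, then the top strand of each duplex anneals to
        # the bottom strand of its right-hand neighbour and is extended.
        merged = []
        for left, right in zip(duplexes, duplexes[1:]):
            joined = anneal_and_fill(left, reverse_complement(right), min_overlap)
            if joined is None:
                raise ValueError("neighbouring fragments do not overlap")
            merged.append(joined)
        duplexes = merged
        rounds += 1
    return duplexes[0], rounds


SEQ2 = "TAACTGCAGATGGCAAACATTTCTCTGGTTGCTGCTGCACTACTGGTCTTGCTGGTGTTGGGTCATGCC"
SEQ3 = "GGTGGCATCATCCTCTTCAAACTCCACAACTGTCCTGTAGATGCTTGCAGTGGCATGACCCAACACCAG"
SEQ4 = "GAAGAGGATGATGCCACCAACCCAATAGGTCCTAAGATGAGGAAATGCAGAAAGGAG"
SEQ5 = "CCATTGTTGGCAAGCTCTCAACATTTGTTCCTTCTGGAACTCCTTTCTGCATTTCC"

product, rounds = assemble([(SEQ2, SEQ3), (SEQ4, SEQ5)])
print(rounds, len(product))  # 2 198
```

With two oligonucleotide pairs the sketch needs two extension rounds and yields a 198-base fragment; applied to six pairs, the same loop would run six rounds.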
PstI

TAACTGCAG ATG GCA AAC ATT TCT GTG GTT GCT GCT GCA CTA CTG GTC 48
Met Ala Asn Ile Ser Val Val Ala Ala Ala Leu Leu Val
1 5 10

TTG CTG GTG TTG GGT CAT GCC ACT GCA AGC ATC TAC AGG ACA GTT GTG 96
Leu Leu Val Leu Gly His Ala Thr Ala Ser Ile Tyr Arg Thr Val Val
15 20 25

GAG TTT GAA GAG GAT GAT GCC ACC AAC CCA ATA GGT CCT AAG ATG AGG 144
Glu Phe Glu Glu Asp Asp Ala Thr Asn Pro Ile Gly Pro Lys Met Arg
30 35 40 45

AAA TGC AGA AAG GAG TTC CAG AAG GAA CAA ATG TTG AGA GCT TGC CAA 192
Lys Cys Arg Lys Glu Phe Gln Lys Glu Gln Met Leu Arg Ala Cys Gln
50 55 60 65

CAA TGG TTG AGG AAA CAA GCT AGA CAA GGA AGA TCT GAT GAA TTT GAC 240
Gln Trp Leu Arg Lys Gln Ala Arg Gln Gly Arg Ser Asp Glu Phe Asp
70 75 80 85

TTT GAA GAT GAC ATG GAG AAT CCT CAA GGA CCA CAG CAG AGA CCT CCT 288
Phe Glu Asp Asp Met Glu Asn Pro Gln Gly Pro Gln Gln Arg Pro Pro
90 95 100

CTC CTT CAG AAG TGC TGT GAG CAA CTC AAA CAG ATG CAA TCT CAG TGT 336
Leu Leu Gln Lys Cys Cys Glu Gln Leu Lys Gln Met Gln Ser Gln Cys
105 110 115

GTT TGC CCA ACC CTT AAA GGT GCC AGC AAA GCT GTG AAA CAG GAA GAG 384
Val Cys Pro Thr Leu Lys Gly Ala Ser Lys Ala Val Lys Gln Glu Glu
120 125 130

CAG CAA CAA GGC CAG CAA CAA GGT AAG CAG CAG ATG GTT AGG AAG ATC 432
Gln Gln Gln Gly Gln Gln Gln Gly Lys Gln Gln Met Val Arg Lys Ile
135 140 145

TAT AAG ACT GCC AAA CAC CTT CCT AAA GTC TGT GAC ATT CCA CAG GTT 480
Tyr Lys Thr Ala Lys His Leu Pro Lys Val Cys Asp Ile Pro Gln Val
150 155 160 165

EcoRI

GAT GTA TGC CCA TTT CAG AAG ACC ATG CCT GGG CCC TCA TAC TAGAATT 529
Asp Val Cys Pro Phe Gln Lys Thr Met Pro Gly Pro Ser Tyr ***
170 175

CAAT 533

FIGURE 1
[Figure: the sense oligonucleotides (SEQ ID NOS: 2, 4, 6, 8, 10 and 12), written 5' to 3', aligned above the antisense oligonucleotides (SEQ ID NOS: 3, 5, 7, 9, 11 and 13), written 3' to 5', in complementary overlapping relationship.]
Sequential extension from overlap regions of oligonucleotides:

First extension

Oligo pair 1+/1-
Oligo pair 2+/2-
Oligo pair 3+/3-
Oligo pair 4+/4-
Oligo pair 5+/5-
Oligo pair 6+/6-

Second extension

Oligo pairs 1+2
Oligo pairs 2+3
Oligo pairs 3+4
Oligo pairs 4+5
Oligo pairs 5+6

Third extension

Oligo pairs 1+2+3
Oligo pairs 2+3+4
Oligo pairs 3+4+5
Oligo pairs 4+5+6

Fourth extension

Oligo pairs 1+2+3+4
Oligo pairs 2+3+4+5
Oligo pairs 3+4+5+6

Fifth extension

Oligo pairs 1-5
Oligo pairs 2-6

Sixth extension

Oligo pairs 1-6
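The pattern in this extension diagram is regular: each round joins neighbouring products, so round r yields fragments spanning r consecutive oligo pairs. A short sketch that generates the same schedule for any number of pairs follows; the function name `extension_schedule` is hypothetical.

```python
def extension_schedule(n_pairs):
    """For each extension round, list the (first, last) oligo pairs each product spans."""
    return [
        [(start, start + r - 1) for start in range(1, n_pairs - r + 2)]
        for r in range(1, n_pairs + 1)
    ]


schedule = extension_schedule(6)
for round_number, spans in enumerate(schedule, start=1):
    print(round_number, spans)
```

For six pairs the second round lists the five products 1+2 through 5+6, and the sixth round lists the single product spanning pairs 1 to 6.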